A set is simply a collection of objects, called its elements or members. The name of a set is typically denoted with an upper case letter (such as $S$).
There are three main ways to represent and define sets.
The *descriptive form* uses words to describe a set. For example, the set $S$ is the set of all odd natural numbers which are less than 12.
The *set-builder form* defines a set by specifying a condition that all of its members satisfy and looks like this:
$$\text{set } = \{\text{ placeholder }|\text{ condition }\}$$
The placeholder is simply there so you can use it to more easily write the condition. The `|` character can be read as "such that". For example, specifying the aforementioned set $S$ using set-builder notation will look like the following.
$$S = \{s|s \text{ is an odd number and } 0 \lt s \lt 12\}$$
The final way to define a set is simply by listing all of its elements or listing enough of them, so that whoever is reading the definition can easily establish the pattern they follow. For example, the aforementioned set will be written as
$$S = \{1,3,5,7,9,11\}$$
To state that an object $x$ is a member of a particular set $S$, we write $x \in S$; to state that it is not a member, we write $x \notin S$. Some sets are so commonly used that they have special names and symbols:
- $\emptyset$ - the empty set, which is the set with *no* elements and is considered to be a subset of every set
- $\mathbb{N} = \{0,1,2,3,...\}$ - the set of all natural numbers; some definitions include zero while others do not, here it is included for simplicity
- $\mathbb{N}_0$ - the set of all natural numbers with 0 explicitly included
- $\mathbb{Z} = \{...,-2,-1,0,1,2,...\}$ - the set of all integers
- $\mathbb{Q}$ - the set of all rational numbers, i.e. numbers which can be represented as the quotient of two integers
- $\mathbb{R}$ - the set of all real numbers; this is the set of all the rational numbers and all the irrational numbers such as $\pi$ and $e$
The number of elements in a set is called its cardinality and is denoted by $|S|$.
If a set contains more than a single copy of one of its elements, the additional copies are not taken into account. For example, $\{1,2,3,4,1,5,2\}$ is mathematically considered the exact same set as $\{1,2,3,4,5\}$ and so the size of both sets is 5.
The union of two sets $A$ and $B$, denoted by $A \cup B$, is the set of all elements which are in $A$, in $B$, or in both.
The intersection of two sets $A$ and $B$, denoted by $A \cap B$, is the set of all elements which are in both $A$ and $B$.
If the two sets have no elements in common, then their intersection is the empty set $\emptyset$.
The relative complement of a set $B$ in a set $A$, denoted by $A \setminus B$, is the set of all elements of $A$ which are not in $B$.
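These operations map directly onto Python's built-in `set` type; a minimal sketch with made-up sets $A$ and $B$:

```python
# Sketch of the set operations above using Python's built-in set type.
# The sets A and B are made-up examples, not taken from the text.
A = {1, 2, 3, 4, 5}
B = {4, 5, 6, 7}

print(A | B)   # union A ∪ B
print(A & B)   # intersection A ∩ B
print(A - B)   # relative complement of B in A

# Duplicates are ignored, as described earlier: this set has cardinality 5.
S = {1, 2, 3, 4, 1, 5, 2}
print(len(S))  # 5
```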
A string is a sequence of characters. The set of characters that we can choose from to make our string is called an alphabet and is usually denoted by $\Sigma$. For example, given the alphabet $\Sigma = \{a,b,c,d\}$, we can construct strings such as $abcd$, $ac$, $acd$, $c$, etc.
The set of all strings of a certain length $n$ over an alphabet $\Sigma$ is denoted by $\Sigma^n$.
If we wanted to denote the set of all possible strings of any finite length over a given alphabet, we would write $\Sigma^*$. For example, $\{a,b,c,d\}^*$ contains strings such as $ab$ or $aaccdba$.
Some noteworthy examples:
- the empty string "" which has no characters and can be constructed with any alphabet
- binary strings $\{0,1\}^*$ - strings which only contain 0s and 1s
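The finite set $\Sigma^n$ can be enumerated directly; a small sketch for the binary alphabet (the choice of $\Sigma = \{0,1\}$ and $n = 3$ is just an example):

```python
from itertools import product

# Enumerating Σ^n for a small alphabet. Σ* is infinite, so it can only
# ever be enumerated up to some chosen length.
sigma = "01"  # the binary alphabet {0, 1}
n = 3

sigma_n = ["".join(t) for t in product(sigma, repeat=n)]
print(sigma_n)       # all binary strings of length 3
print(len(sigma_n))  # |Σ^n| = |Σ|^n = 2^3 = 8
```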
A function takes an input and produces an output. The inputs of a function are called its arguments and can be different types of objects, and so can its output. For example, a function may take in a natural number and a binary string and may output a single bit. The types of the inputs and outputs of a function are specified by sets in its declaration, which has the following syntax:
$$\text{name}: \text{input set(s)} \to \text{output set(s)}$$
Consider the following function:
$$F: \{0,1\}^3 \to \{0,1\}$$
We do not know what precisely this function does, but we know that it takes a binary string of length 3 and outputs a single bit - 0 or 1. Similarly, the function
$$F: \mathbb{N} \times \{0,1\}^* \to \{0,1\}^* \times \{0,1\}^*$$
takes in a natural number and a binary string of any length and outputs two binary strings of arbitrary length, too. An example of such a function would be a function which splits a given binary string at the position indicated by the natural number and returns the two split parts.
The input set is called the function's *domain* and the output set is called its *codomain*.
A function definition describes what the function outputs given a particular input and has the syntax
$$\text{name}(\text{placeholder}) \coloneqq \text{expression}$$
The expression can be a mathematical formula, it can be a sentence explaining what the function does, or it can be a mixture of both.
The function $f: \mathbb{R} \to \mathbb{R}$ which returns the square of its input would be defined as follows:
$$f(x) \coloneqq x^2$$
The $x$ is just an arbitrary placeholder for the argument - we could have very well used $y$ or a word or anything we would like.
The function $PALINDROME: \{0,1\}^* \to \{0,1\}$ is the function which outputs 1 if its input is a palindrome string and outputs 0 otherwise. This is an example of a definition given as a sentence.
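One possible way to implement the $PALINDROME$ function described above (the reversal-based check is our choice, not prescribed by the text):

```python
# A possible implementation of PALINDROME: outputs 1 if its input binary
# string reads the same forwards and backwards, and 0 otherwise.
def palindrome(s: str) -> int:
    return 1 if s == s[::-1] else 0

print(palindrome("0110"))  # 1
print(palindrome("01"))    # 0
print(palindrome(""))      # 1 - the empty string is trivially a palindrome
```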
Functions can also be piecewise-defined. This is when the function does different things depending on whether its input satisfies a given condition. For example, the absolute value function $|\cdot|: \mathbb{R} \to \mathbb{R}$ returns its input if it is non-negative and the negation of its input otherwise:
$$|x| \coloneqq \begin{cases} x, & x \ge 0 \\ -x, & x \lt 0 \end{cases}$$
Finally, a function can be specified by a table listing all its inputs and their corresponding outputs. For example,
$x$ | $f(x)$ |
---|---|
0 | 4 |
2 | 17 |
3 | 1 |
4 | 26 |
... | ... |
This does not give us a very good idea of what the function does unless we can discern a pattern in the table, which is why this method is rarely used.
A function need not be defined for all values in its domain. For example, the division function $DIV: \mathbb{R} \times \mathbb{R} \to \mathbb{R}$, or alternatively $DIV(a,b) = \frac{a}{b}$, is not defined for $b = 0$ because one cannot divide by 0. Such functions are called *partial* and the set of all values for which the function is actually defined is called its *natural domain*. This can be seen from the following diagram for a function $f: X \to Y$:
![](Resources/Images/Function%20Domain.svg)
The domain is $X$, while the natural domain is $\{x_1,x_2,x_3,...\} \subset X$. A function which is defined for *all* values in its domain is called a *total* function.
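One way to model the partial function $DIV$ from above in code is to return a sentinel value outside the natural domain (using `None` here is our modelling choice, not something the text prescribes):

```python
# Sketch of the partial function DIV: it is undefined for b = 0, so 0 lies
# outside its natural domain. None models "undefined".
def div(a: float, b: float):
    if b == 0:
        return None  # b = 0 is outside the natural domain
    return a / b

print(div(6, 3))  # 2.0
print(div(1, 0))  # None
```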
These are terms which describe the relationship a function establishes between its input sets and its output sets.
An *injective* function, or one-to-one function, is a function which, given two different inputs, will always produce two different outputs - no two elements of its domain are mapped to the same element of its codomain. An example of such a function is $f(x) = x + 1$ - there are no two inputs $x \ne x'$ for which $f(x) = f(x')$. However, the function $g(x) = x^2$ is *not* an injection because opposite numbers produce the same output, i.e. $g(2) = g(-2) = 4$.
A *surjective* function is a function which covers its entire codomain. For example, $f: \mathbb{R} \to \mathbb{R}$ with $f(x) = x + 1$ is a surjection because every real number can be produced from it, i.e. for every $y \in \mathbb{R}$ there is at least one number $x$ such that $f(x) = y$. Contrastingly, the absolute value function $|\cdot|: \mathbb{R} \to \mathbb{R}$ is *not* surjective because it cannot produce negative values. The subset of the codomain which contains all values which can be obtained from the function is called the function's *image*.
A *bijective* function, also known as a one-to-one correspondence, is a function which is both surjective and injective, i.e. it covers its entire codomain and assigns to every element in it exactly *one* element from its domain.
![](Resources/Images/Injection,%20Surjection,%20Bijection.svg)
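Over a *finite* domain, these properties can be checked by brute force; a sketch using the example functions from above (the helper names and the concrete sets are made up for illustration):

```python
# Brute-force checks of injectivity and surjectivity for functions on
# small finite sets.
def is_injective(f, domain):
    outputs = [f(x) for x in domain]
    return len(outputs) == len(set(outputs))  # no two inputs collide

def is_surjective(f, domain, codomain):
    return {f(x) for x in domain} == set(codomain)  # image covers codomain

domain = range(-3, 4)  # the integers -3 through 3
print(is_injective(lambda x: x + 1, domain))  # True:  f(x) = x + 1
print(is_injective(lambda x: x * x, domain))  # False: g(2) == g(-2)
print(is_surjective(lambda x: x + 1, range(5), range(1, 6)))  # True
```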
There are a few functions used extensively throughout cryptography and computer science. Although they are defined on single bits, every one of them can be extended to binary strings simply by applying the function on a bit-by-bit basis.
The NOT function takes in a single bit and inverts it:
a | NOT(a) |
---|---|
0 | 1 |
1 | 0 |
The function $NOT(a)$ can also be written as $\neg a$.
The AND function takes in two bits and outputs 1 only if both of them are 1:
a | b | AND(a,b) |
---|---|---|
0 | 0 | 0 |
0 | 1 | 0 |
1 | 0 | 0 |
1 | 1 | 1 |
The function $AND(a,b)$ can also be written as $a \land b$.
The OR function takes in two bits and outputs 1 if at least one of them is 1:
a | b | OR(a,b) |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 1 |
The function $OR(a,b)$ can also be written as $a \lor b$.
The eXclusive OR function, XOR, takes in two bits and outputs 1 if exactly one of them is 1:
a | b | XOR(a,b) |
---|---|---|
0 | 0 | 0 |
0 | 1 | 1 |
1 | 0 | 1 |
1 | 1 | 0 |
The function $XOR(a,b)$ can also be written as $a \oplus b$.
This function is ubiquitous in cryptography due to its four essential properties:
Property | Formula |
---|---|
Commutativity | $a \oplus b = b \oplus a$ |
Associativity | $(a \oplus b) \oplus c = a \oplus (b \oplus c)$ |
Identity | $a \oplus 0 = a$ |
Involution | $(a \oplus b) \oplus b = a$ |
Commutativity means that the two inputs can change places and the output would still be the same. Associativity means that, given a chain of XOR operations, the order in which they are executed is irrelevant to the final result. Identity indicates that there is a specific input, called the identity element, for which the XOR operation simply outputs the other input.
Involution is a fancy way of saying that XOR is its own inverse operation. Given the output of a XOR operation and one of its inputs, the other input can be obtained by XOR-ing the output with the known input.
Another interesting property of $XOR$ is that XOR-ing a bit with itself will always produce 0. This is often used in computers to reset a [register](../Reverse%20Engineering/Assembly%20Programming/x86-64/Registers.md) to 0.
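The bit-by-bit extension to binary strings mentioned earlier, together with the involution and self-cancellation properties, can be sketched as follows (the strings `m` and `k` are made-up examples):

```python
# XOR extended from single bits to binary strings, applied bit-by-bit.
def xor(a: str, b: str) -> str:
    return "".join(str(int(x) ^ int(y)) for x, y in zip(a, b))

m = "1011001"
k = "0110101"
c = xor(m, k)
print(c)          # "1101100"
print(xor(c, k))  # involution: XOR-ing with k again recovers m
print(xor(m, m))  # XOR-ing a string with itself gives all zeros
```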
A function $\mu :\mathbb{N} \to [0,1]$ is *negligible* if for every polynomial $p: \mathbb{N} \to \mathbb{N}$ there exists a number $N \in \mathbb{N}$ such that $\mu(n) \lt \frac{1}{p(n)}$ for every $n \gt N$.
The definition itself is not that important; just remember that a negligible function approaches 0, and does so quickly, as its input approaches infinity. That is, no matter how big a polynomial one can think of, after some input $N$ the function will always be smaller than the reciprocal of that polynomial.
The reason the function outputs a number between 0 and 1 is that such functions are usually used in the context of probabilities (as is the case here).
The reason we want the negligible function to get smaller and smaller as its input gets larger and larger is because we are using the key length as its input - the longer the key, the smaller we want an adversary's probability of success to be.
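A small numeric sketch of the definition, using the hypothetical choices $\mu(n) = 2^{-n}$ (which is negligible) and $p(n) = n^3$: past some $N$, $\mu(n)$ drops below $\frac{1}{p(n)}$ and stays there.

```python
# mu(n) = 2^(-n) is a negligible function; p(n) = n^3 is one example
# polynomial. For n > N (here N is around 10), mu(n) < 1/p(n) holds.
def mu(n):
    return 2.0 ** -n

def inv_poly(n):
    return 1.0 / n ** 3

for n in [2, 5, 10, 20, 30]:
    print(n, mu(n) < inv_poly(n))
```

Trying a bigger polynomial such as $n^{100}$ only pushes $N$ further out; it never stops $2^{-n}$ from eventually winning.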
When we perform an experiment such as tossing a fair coin, we obtain a certain result from it called its outcome.
The *outcome* of an experiment is all the information about the experiment after it has been carried out.
For the experiment of the coin toss, the outcome is simply the coin's face after the toss and will be either heads ($H$) or tails ($T$).
The *sample space* of an experiment is the set of *all* possible outcomes from the experiment.
Consider the experiment of tossing a coin three times. Its sample space is $\{HHH,HHT,HTH,HTT,THH,THT,TTH,TTT\}$, or equivalently $\{000,001,010,011,100,101,110,111\}$ if we encoded "heads" with $0$ and "tails" with $1$.
Each outcome can be associated with a number, called its probability, which describes how likely this outcome is. However, not all outcomes in the sample space need to have the same probability. Suppose that our coin was "rigged" (maybe it weighed more on one side) and actually was more inclined to result in heads rather than tails. Then, if the coin was tossed three times, the outcome $HHH$ would be more likely than the outcome $TTT$.
A *probability space* is a sample space $S$ with a total function $\Pr: S \to [0,1]$ such that
$$\sum_{s \in S} \Pr(s) = 1$$
The function $\Pr$ is called a *probability function* over the sample space $S$.
The probability function $\Pr$ assigns to each possible outcome a probability value between 0 and 1. The sum of all the probabilities must be one because *some* outcome is guaranteed to happen. If the probabilities did not sum up to one, then there would be a chance that the experiment resulted in an outcome outside its sample space, which is impossible, since the sample space is the set of *all* possible outcomes.
If all outcomes from the experiment are equally likely, then they have the same probability and the probability of every outcome $s$ is $\Pr(s) = \frac{1}{|S|}$.
When this is the case, the probability function $\Pr$ is called *uniform*.
An event $E$ is any subset of the sample space $S$, i.e. $E \subseteq S$.
The probability of this event occurring (i.e. getting one of its elements as an outcome), denoted by $\Pr_{s \in S}[E]$, is the sum of the probabilities of its outcomes.
When the sample space is understood from context, this can be simply written as $\Pr[E]$.
If we wanted to describe the event that we get "tails" an even number of times from the three coin tosses, then we would do it as $E = \{s: s \text{ has an even number of "tails"}\} = \{HHH, HTT, THT, TTH\}$. The probability of this event is the sum of the probabilities of its outcomes. We assumed a fair coin, so each outcome in the sample space $S$ has the same probability $\Pr(s) = \frac{1}{|S|}$. Then,
$$\Pr[E] = \sum_{s \in E} \Pr(s) = 4\times \frac{1}{|S|}$$
The total number of outcomes, $|S|$, is eight as we saw earlier, so
$$\Pr[E] = \frac{4}{8} = \frac{1}{2}$$
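The computation above can be verified by brute force - enumerate the sample space, pick out the event, and divide:

```python
from itertools import product

# Brute-force check: the probability of an even number of tails
# in three fair coin tosses.
sample_space = ["".join(t) for t in product("HT", repeat=3)]
event = [s for s in sample_space if s.count("T") % 2 == 0]

prob = len(event) / len(sample_space)
print(event)  # ['HHH', 'HTT', 'THT', 'TTH']
print(prob)   # 0.5
```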
For an event $E$, the probability that it does *not* occur is $\Pr[\lnot E] = 1 - \Pr[E]$.
Given two possible events $E_1$ and $E_2$, the probability that at least one of them occurs is at most the sum of their individual probabilities - this is known as the *union bound*: $\Pr[E_1 \cup E_2] \le \Pr[E_1] + \Pr[E_2]$.
A random variable (which is a terrible misnomer, but again, mathematicians...) is a way to assign a number to every outcome in the sample space, i.e. a total function $X: S \to \mathbb{R}$.
Consider the experiment of rolling a fair die three times. Each roll has six possible outcomes - $\{1,2,3,4,5,6\}$ - and there are three rolls, so the sample space is $\{1,2,3,4,5,6\}^3$. One possible random variable for this experiment would be the sum $\textit{SUM}: \{1,2,3,4,5,6\}^n \to \mathbb{R}$ of the points from the three rolls ($n = 3$ in this case).
In fact, we have already seen another possible random variable which can be defined for every sample space - that's right, probability! Since the probability function $\Pr$ assigns a real number between 0 and 1 to every outcome in the sample space, it is itself a random variable.
The expectation value of a random variable $X$, denoted by $\mathbb{E}[X]$, describes the value we would get from $X$ "on average". When all outcomes are equally likely, it is calculated by summing the values of the random variable for all outcomes in the sample space and then dividing by the total number of outcomes:
$$\mathbb{E}[X] = \sum_{x \in S} \frac{X(x)}{|S|}$$
For the previous example where $\textit{SUM}: \{1,2,3,4,5,6\}^n \to \mathbb{R}$ was the random variable which for each outcome was equal to the sum of the three rolls, the expectation value can be calculated as follows:
$$\mathbb{E}[\textit{SUM}] = \sum_{x\in\{1,2,3,4,5,6\}^3} \frac{\textit{SUM}(x)}{|\{1,2,3,4,5,6\}^3|} = \frac{2268}{216} = 10.5$$
Of course, calculating this by summing up all the numbers for every outcome is tedious, but it can be circumvented using some properties of expectation.
There are two properties of the expectation value that one should be aware of.
For every two random variables $X$ and $Y$ over the same sample space $S$, the expectation value of their sum (which is itself a random variable defined as $X(x) + Y(x)$ for every $x\in S$) is equal to sum of the expectation values of $X$ and $Y$.
$$\mathbb{E}[X + Y] = \mathbb{E}[X] + \mathbb{E}[Y]$$
Similarly, for every random variable $X$ and constant $k \in \mathbb{R}$, the expectation value of $k$ multiplied by $X$ is equal to $k$ multiplied by the expectation value of $X$.
$$\mathbb{E}[kX] = k\mathbb{E}[X]$$
For the sum part,
$$\mathbb{E}[X + Y] = \sum_{x\in S} \frac{X(x) + Y(x)}{|S|} = \sum_{x\in S} \frac{X(x)}{|S|} + \sum_{x\in S} \frac{Y(x)}{|S|} = \mathbb{E}[X] + \mathbb{E}[Y]$$
For the multiplication by a constant part,
$$\mathbb{E}[kX] = \sum_{x\in S} \frac{k\times X(x)}{|S|} = k\times \sum_{x\in S} \frac{X(x)}{|S|} = k \mathbb{E}[X]$$
Linearity can be used to calculate the expectation of the $\textit{SUM}$ random variable which we defined for the experiment of rolling a die three times. The total number of points from the three rolls is just the sum of the points from each separate roll - "the sum of the three rolls is the sum of the sums of each separate roll" - which allows us to use linearity.
If we denoted the number of points from the first, second and third roll with $a,b,c$, respectively, then the final outcome will be written as $abc$ (this is concatenation, not multiplication) and we have, by linearity,
$$\begin{align}\mathbb{E}[\textit{SUM}(abc)] &= \mathbb{E}[\textit{SUM}(a)+\textit{SUM}(b)+\textit{SUM}(c)] \\ &= \mathbb{E}[\textit{SUM}(a)] + \mathbb{E}[\textit{SUM}(b)] + \mathbb{E}[\textit{SUM}(c)] \\ &= \sum_{x\in\{1,2,3,4,5,6\}} \frac{\textit{SUM}(x)}{6} + \sum_{x\in\{1,2,3,4,5,6\}} \frac{\textit{SUM}(x)}{6} + \sum_{x\in\{1,2,3,4,5,6\}} \frac{\textit{SUM}(x)}{6} \\ &= 3 \times \sum_{x\in\{1,2,3,4,5,6\}} \frac{\textit{SUM}(x)}{6} \\ &= 3\times \frac{1+2+3+4+5+6}{6} \\ &= 3\times 3.5 = 10.5\end{align}$$
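Both routes - the tedious direct sum over all $216$ outcomes and the linearity shortcut $3 \times 3.5$ - can be confirmed by brute force:

```python
from itertools import product

# Brute-force check of E[SUM] for three rolls of a fair die: the direct
# sum over all 6^3 = 216 outcomes should equal 2268/216 = 10.5,
# matching the linearity shortcut 3 * 3.5.
faces = [1, 2, 3, 4, 5, 6]
outcomes = list(product(faces, repeat=3))

expectation = sum(sum(o) for o in outcomes) / len(outcomes)
print(len(outcomes))  # 216
print(expectation)    # 10.5
```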
Random variables (which output only real numbers) are a special case of all total surjective functions $f: S \to O$ from the sample space $S$ onto some arbitrary finite set $O$.
A *probability distribution* over a finite set $O$ is a total *probability function* $\Pr$ such that
$$\sum_{o \in O} \Pr(o) = 1$$
This definition is quite broad and does not even mention the function $f$. This is because a probability distribution is just a way to assign a probability value to every member of a set $O$.
When we say that we "choose a random member from a set $O$" according to some distribution $\mathcal{D}$, we simply mean that the probability of choosing a particular $o \in O$ is equal to $\Pr(o)$.
The requirement that the sum of all the probabilities is equal to 1 is very intuitive - we are choosing from a finite set $O$, so we must get *some* member of it.
This is all great, but how do we know what probability to assign to each member of $O$? One way is to derive it from the function $f$: if every $s \in S$ is equally likely, then the probability of $o \in O$ is the fraction of inputs which $f$ maps to $o$, i.e. $\Pr(o) = \frac{|\{s \in S: f(s) = o\}|}{|S|}$.
In this way, it makes some sense to call this a *probability function* because $\Pr(o)$ tells us how likely it is that $f$ outputs $o$ when choosing an $s \in S$ uniformly at random.
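This induced distribution is easy to compute for small sets; a sketch where the sample space and the function $f$ are made-up examples (not taken from the text):

```python
from collections import Counter

# Choosing s from S uniformly and applying f induces a distribution on O:
# Pr(o) is the fraction of inputs that f maps to o.
S = range(8)          # a hypothetical sample space, sampled uniformly
f = lambda s: s % 3   # a hypothetical surjection onto O = {0, 1, 2}

counts = Counter(f(s) for s in S)
dist = {o: counts[o] / len(S) for o in counts}
print(dist)                   # {0: 0.375, 1: 0.375, 2: 0.25}
print(sum(dist.values()))     # the probabilities sum to 1.0
```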